δ-Tolerance Closed Frequent Itemsets

نویسندگان

  • James Cheng
  • Yiping Ke
  • Wilfred Ng
چکیده

In this paper, we study an inherent problem of mining Frequent Itemsets (FIs): the number of FIs mined is often too large. The large number of FIs not only affects the mining performance, but also severely thwarts the application of FI mining. In the literature, Closed FIs (CFIs) and Maximal FIs (MFIs) are proposed as concise representations of FIs. However, the number of CFIs is still too large in many cases, while MFIs lose information about the frequency of the FIs. To address this problem, we relax the restrictive definition of CFIs and propose the δ-Tolerance CFIs (δTCFIs). Mining δ-TCFIs recursively removes all subsets of a δ-TCFI that fall within a frequency distance bounded by δ. We propose two algorithms, CFI2TCFI and MineTCFI, to mine δ-TCFIs. CFI2TCFI achieves very high accuracy on the estimated frequency of the recovered FIs but is less efficient when the number of CFIs is large, since it is based on CFI mining. MineTCFI is significantly faster and consumes less memory than the algorithms of the state-of-the-art concise representations of FIs, while the accuracy of MineTCFI is only slightly lower than that of CFI2TCFI.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Frequent Closed Itemsets with the Frequent Pattern List

The mining of the complete set of frequent itemsets will lead to a huge number of itemsets. Fortunately, this problem can be reduced to the mining of frequent closed itemsets (FCIs), which results in a much smaller number of itemsets. The approaches to mining frequent closed itemsets can be categorized into two groups: those with candidate generation and those without. In this paper, we propose...

متن کامل

CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data

Recently, frequent itemsets mining over data streams attracted much attention. However, mining closed itemsets from data stream has not been well addressed. The main difficulty lies in its high complexity of maintenance aroused by the exact model definition of closed itemsets and the dynamic changing of data streams. In data stream scenario, it is sufficient to mining only approximated frequent...

متن کامل

The Algorithm of Mining Frequent Closed Itemsets Based on Index Array

The set of frequent closed itemsets determines exactly the complete set of all frequent itemsets and is usually much smaller than the latter. In this paper, an algorithm based on index array for mining frequent closed itemsets, Index-FCI is proposed. The vertical BitTable is adopted to compress the dataset for counting fast the support. To make use of the horizontal BitTable, the index array co...

متن کامل

Using attribute value lattice to find closed frequent itemsets

Finding all closed frequent itemsets is a key step of association rule mining since the non-redundant association rule can be inferred from all the closed frequent itemsets. In this paper we present a new method for finding closed frequent itemsets based on attribute value lattice. In the new method, we argue that vertical data representation and attribute value lattice can find all closed freq...

متن کامل

Accelerating Closed Frequent Itemset Mining by Elimination of Null Transactions

The mining of frequent itemsets is often challenged by the length of the patterns mined and also by the number of transactions considered for the mining process. Another acute challenge that concerns the performance of any association rule mining algorithm is the presence of „null‟ transactions. This work proposes a closed frequent itemset mining algorithm viz., Closed Frequent Itemset Mining a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006